Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Dec 1, 2020

Our graph has some nodes without tasks, e.g. nodes which create a choke point between two sets of parallel nodes. Ideally we give all nodes descriptive names (#435), but until we have that, require at least one entry in Tasks before we store a node as firstIncompleteNode.
This avoids the chance of panicking during the subsequent:

fmt.Errorf("%d incomplete task nodes, beginning with %s", incompleteCount, firstIncompleteNode.Tasks[0])

Our graph has some nodes without tasks, e.g. nodes which create a
choke point between two sets of parallel nodes.  Ideally we give all
nodes descriptive names [1], but until we have that, require at least
one entry in Tasks before we store a node as firstIncompleteNode.
This avoids the chance of panicking during the subsequent [2]:

  fmt.Errorf("%d incomplete task nodes, beginning with %s", incompleteCount, firstIncompleteNode.Tasks[0])

[1]: openshift#435
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1903382
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Dec 1, 2020
@openshift-ci-robot
Copy link
Contributor

@wking: This pull request references Bugzilla bug 1903382, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1903382: pkg/payload/task_graph: Require firstIncompleteNode to have tasks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Dec 1, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 1, 2020
for i, result := range results {
if result == nil {
if firstIncompleteNode == nil {
if firstIncompleteNode == nil && len(graph.Nodes[i].Tasks) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you do this, the check on line 548 needs to consider the case where firstIncompleteNode is nil and the incompleteCount is greater than zero.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 548 already has a guard for firstIncompleteNode != nil. If incompleteCount is greater than zero, and firstIncompleteNode is nil, I'm not sure we care, since no meaningful work was left on the table. But I can check nextNode to make sure...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure why the Task i.e. .Tasks[0] is nil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you don't produce the error in that case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not sure why the Task i.e. .Tasks[0] is nil.

We have empty placeholder nodes in the graph, like:

Screenshot 2021-02-01 at 15 11 34

But I can check nextNode to make sure...

I'm pretty sure we're fine with the change I'm proposing, without needing to involve nextNode. Cases:

  • All result-less nodes have no tasks. Then with my change I will never set firstIncompleteNode, and we will not inject an incomplete task nodes, beginning with... error. But there were no outstanding tasks, so not injecting that error is an improvement.
  • The first result-less node has at least one task. Then that matches my guard, and we get exactly the same non-panicky firstIncompleteNode handling as we have in master today.
  • The first result-less node has no tasks, but some later result-less node has at least one task. Then that later node will match my guard, and we'll mention it with a incomplete task nodes, beginning with... error instead of panicking (where we panic today).

@jottofar
Copy link
Contributor

jottofar commented Dec 2, 2020

Why is it preferred to change the selection of firstIncompleteNode rather than checking the Tasks length in the fmt.Errorf and indicating the firstIncompleteNode has no tasks? It's still the first incomplete node.

@wking
Copy link
Member Author

wking commented Dec 2, 2020

It's still the first incomplete node...

But without some of the node-naming work from #435, we can't say "the empty node between $GROUP_A and $GROUP_B was canceled", and have to say "some empty node was canceled; no idea how far we got through the graph or what real work was left uncovered". It seems more useful to try to step forward and figure out what the real work is that we're not getting to.

@openshift-merge-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/integration 6b1470b link /test integration
ci/prow/e2e-agnostic 6b1470b link /test e2e-agnostic
ci/prow/e2e-agnostic-operator 6b1470b link /test e2e-agnostic-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jottofar
Copy link
Contributor

jottofar commented Feb 1, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 1, 2021
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Copy link
Member Author

wking commented Feb 1, 2021

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 6c55d1e into openshift:master Feb 2, 2021
@openshift-ci-robot
Copy link
Contributor

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 1903382 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1903382: pkg/payload/task_graph: Require firstIncompleteNode to have tasks

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the taskless-node-panic branch February 2, 2021 18:34
@wking
Copy link
Member Author

wking commented Feb 2, 2021

/cherrypick release-4.6

@openshift-cherrypick-robot

@wking: new pull request created: #510

Details

In response to this:

/cherrypick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants